Abstract 



Common wisdom has it that small distinctions in the probabilities (parameters) quantifying 
a belief network do not matter much for the results of probabilistic queries. Yet, 
one can develop realistic scenarios under which small variations in network parameters can 
lead to significant changes in computed queries. A pending theoretical question is then to 
analytically characterize parameter changes that do or do not matter. In this paper, we 
study the sensitivity of probabilistic queries to changes in network parameters and prove 
some tight bounds on the impact that such parameters can have on queries. Our analytic 
results pinpoint some interesting situations under which parameter changes do or do not 
matter. These results are important for knowledge engineers as they help them identify 
influential network parameters. They also help explain some of the previous experimental 
results and observations with regard to network robustness against parameter changes. 

1. Introduction 

A belief network is a compact representation of a probability distribution (Pearl, 1988; 
Jensen, 2001). It consists of two parts, one qualitative and the other quantitative. The 
qualitative part of a belief network (called its structure) is a directed acyclic graph in 
which nodes represent domain variables and edges represent direct influences between these 
variables. The quantitative part of a belief network is a set of conditional probability tables 
(CPTs) that quantify our beliefs in such influences. Figure 1 depicts the structure of a 
belief network and Figure 2 depicts its CPTs.^1 

Automated reasoning systems based on belief networks have become quite popular re- 
cently as they have enjoyed much success in a number of real-world applications. Central to 
the development of such systems is the construction of a belief network (hence, a probabil- 
ity distribution) that faithfully represents the domain of interest. Although the automatic 
synthesis of belief networks — based on design information in certain applications and based 
on learning techniques in others — has been drawing a lot of attention recently, mainstream 
methods for constructing such networks continue to be based on traditional knowledge en- 
gineering (KE) sessions involving domain experts. One of the central issues that arise in 
such KE sessions is the assessment of the impact that changes in network parameters may have 
on probabilistic queries of interest. 

1. This specific network and its CPTs are distributed with the evaluation version of the commercial HUGIN 
system at http://www.hugin.com/. 

Figure 1: A belief network structure. 

Figure 2: The CPTs of the belief network shown in Figure 1. 

Consider for example the following common method for constructing belief networks in 
medical diagnosis applications (Coupe, Peek, Ottenkamp, & Habbema, 1999). First, the 
network structure is developed. Next, parameters are estimated by non-experts using a 
combination of statistical data and qualitative influences available from textbook materials. 
Finally, medical experts are brought in to evaluate the network and fine-tune its parameters. 
One method of evaluation is to pose diagnostic scenarios to the network, and compare the 
results of such queries to those expected by the experts. For example, given some set of 
symptoms e, and two potential diagnoses y and z, the network may give us the conclusion 
that Pr(y | e)/Pr(z | e) = 2, while a domain expert may believe that the ratio should 
be no less than 4. Assuming that the network structure is correct, a central question is 
then: which network parameters should be changed to give us the correct ratio, and by how 
much? 

Figure 3: A screen shot of SamIam performing sensitivity analysis on the belief network 
shown in Figure 1. 

To automate the task of identifying such parameter changes, we have recently developed 
a belief network tool, called SamIam (Sensitivity Analysis, Modelling, Inference And 
More).^2 One of its features is sensitivity analysis, which allows domain experts to fine-tune 
network parameters in order to enforce constraints on the results of certain queries. Users 
can specify the constraint that they want to enforce, and SamIam will automatically de- 
cide whether a given parameter is relevant to this constraint, and if it is, will compute the 
minimum amount of change to that parameter which is needed to enforce the constraint. 
The technical details of our approach to sensitivity analysis are the subject of Section 2. 

As we experimented with SamIam, we ran into scenarios that we found to be surprising 
at first glance. Specifically, there were many occasions in which queries would be quite 
sensitive to small variations in certain network parameters. Consider the scenario in Figure 3 
for one example, which corresponds to the network detailed in Figures 1 and 2. Here, we 
have evidence e = report, ¬smoke: people are reported to be evacuating a building, but there 
is no evidence for any smoke. This evidence should make tampering more likely than fire, 
and the given belief network does indeed reflect this with Pr(tampering | e) = .50 and 
Pr(fire | e) = .03. We wanted, however, the probability of tampering to be no less than .65. 
Hence, we asked SamIam to identify parameter changes that can enforce this constraint, 
and it made two recommendations: 

1. either decrease the probability of a false report, Pr(report | ¬leaving), from its current 
value of .01 to ≤ .0047, 

2. or increase the prior probability of tampering from its current value of .02 to ≥ .036. 

2. SamIam is developed by the UCLA Automated Reasoning Group. Its web page is at 
http://reasoning.cs.ucla.edu/. 

Therefore, the distinctions between .02 and .036, and the one between .01 and .0047, do 
really matter in this case as each induces an absolute change of .15 on the probabilistic query 
of interest. Note also that implicit in SamIam's recommendations is that the parameters 
of variables Fire, Smoke, Leaving, and Alarm are irrelevant to enforcing this constraint, i.e. 
no matter how much we change any of these parameters, we would not be able to enforce 
our desired constraint. 

This example shows that the absolute change in a query can be much larger than the 
absolute change in the corresponding parameters. Later, we will show an example where 
an infinitesimal change to a network parameter leads to a change of .5 to a corresponding 
query. We also show examples in which the relative change in the probability of a query is 
larger than the corresponding relative change in a network parameter. One wonders then 
whether there is a different method for measuring probabilistic change (other than absolute 
or relative), which allows one to non-trivially bound the change in a probabilistic query in 
terms of the corresponding change in a network parameter. 

To answer this and related questions, we conduct in Section 3 an analytic study of the 
partial derivative of a probabilistic query Pr(y | e) with respect to some network parameter. 
Our study leads us to three main results: 

1. a bound on the derivative in terms of Pr(y | e) and Pr(x | u) only, which is independent 
of any other aspect of the given belief network; 

2. a bound on the sensitivity of queries to infinitesimal changes in network parameters; 

3. a bound on the sensitivity of queries to arbitrary changes in network parameters. 

The last bound in particular shows that the amount of change in a probabilistic query can 
be bounded in terms of the amount of change in a network parameter, as long as change is 
understood to be the relative change in odds. This result has a number of practical impli- 
cations. First, it can relieve experts from having to be too precise when specifying certain 
parameters subjectively. Next, it can be important for approximate inference algorithms 
that pre-process network parameters to eliminate small distinctions in such parameters, in 
order to increase the efficiency of inference (Poole, 1998). Finally, it can be used to show 
that automated reasoning systems based on belief networks are robust and, hence, suitable 
for real-world applications (Pradhan, Henrion, Provan, Del Favero, & Huang, 1996). 

Section 4 is indeed dedicated to exploring the implications of the above bounds, where 
we provide an analytic explanation of why certain parameter changes don't matter. We 
finally close in Section 5 with some concluding remarks. Proofs of all theorems are given in 
Appendix A. 

2. The Tuning of Network Parameters 

We report in this section on a tool that we have been developing, called SamIam, for fine-tuning 
network parameters (Laskey, 1995; Castillo, Gutierrez, & Hadi, 1997; Jensen, 1999; 
Kjaerulff & van der Gaag, 2000; Darwiche, 2000). Given a belief network, some evidence 
e, which is an instantiation of variables E in the belief network, and two events y and z of 
variables Y and Z respectively, where Y, Z ∉ E, our tool can efficiently identify parameter 
changes needed to enforce the following types of constraints: 
Difference: Pr(y | e) − Pr(z | e) ≥ ε; 

Ratio: Pr(y | e)/Pr(z | e) ≥ ε. 

These two constraints often arise when we debug belief networks. For example, we can 
make event y more likely than event z, given evidence e, by specifying the constraint, 
Pr(y | e) − Pr(z | e) ≥ 0, or we can make event y at least twice as likely as event z, given 
evidence e, by specifying the constraint, Pr(y | e)/Pr(z | e) ≥ 2. We will discuss next how 
one would enforce the two constraints, but we need to settle some notational conventions 
and technical preliminaries first. 

Variables are denoted by upper-case letters (A) and their values by lower-case letters (a). 
Sets of variables are denoted by bold-face upper-case letters (A) and their instantiations 
are denoted by bold-face lower-case letters (a). For a variable A with values true and false, 
we use a to denote A = true and ā to denote A = false (written ¬a when the value is 
spelled out, as in ¬tampering). The CPT for variable X with 
parents U defines a set of conditional probabilities of the form Pr(x | u), where x is a value 
of variable X, u is an instantiation of parents U, and Pr(x | u) is a probability known as a 
network parameter and denoted by θ_{x|u}. We finally recall a basic fact about belief networks. 
The probability of some instantiation x of all network variables X equals the product 
of all network parameters that are consistent with that instantiation. For example, the 
probability of instantiation fire, ¬tampering, smoke, alarm, ¬leaving, report in Figure 1 equals 
.01 × .98 × .9 × .99 × .12 × .01, which is the product of network parameters (from Figure 2) 
that are consistent with this instantiation. 
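
As a quick check of the arithmetic in this example, the product of the six consistent parameters can be computed directly. This is only a minimal sketch: the list name `factors` is ours, and the values are simply the ones quoted in the text.

```python
import math

# Parameters consistent with the instantiation quoted in the text, in the
# order Fire, Tampering, Smoke, Alarm, Leaving, Report (values from the text).
factors = [0.01, 0.98, 0.9, 0.99, 0.12, 0.01]

# The probability of a complete instantiation is the product of the
# network parameters that are consistent with it.
joint = math.prod(factors)
print(f"{joint:.3e}")
```

The result is on the order of 1e-5, which illustrates how small the probability of any single complete instantiation typically is.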



2.1 Binary Variables 

We first consider the parameters of a binary variable X, with two values x and x̄ and, 
hence, two parameters θ_{x|u} and θ_{x̄|u} for each parent instantiation u. We assume that for 
each variable X and parent instantiation u we have a meta parameter τ_{x|u}, such that 
θ_{x|u} = τ_{x|u} and θ_{x̄|u} = 1 − τ_{x|u}. Therefore, our goal is then to determine the amount of 
change to the meta parameter τ_{x|u} which would lead to a simultaneous change in both θ_{x|u} 
and θ_{x̄|u}. We use the meta parameter τ_{x|u} because it is not meaningful to change only θ_{x|u} 
or θ_{x̄|u} without changing the other since θ_{x|u} + θ_{x̄|u} = 1. 

First we observe that the probability of an instantiation e, Pr(e), is a linear function 
in any network parameter θ_{x|u} in a belief network (Russell, Binder, Koller, & Kanazawa, 
1995; Castillo et al., 1997). In fact, the probability is linear in any meta parameter τ_{x|u}. 

Theorem 2.1 The derivative of Pr(e) with respect to the meta parameter τ_{x|u} is given by: 

\[
\frac{\partial Pr(e)}{\partial \tau_{x|u}}
  = \frac{Pr(e, x, u)}{\theta_{x|u}} - \frac{Pr(e, \bar{x}, u)}{\theta_{\bar{x}|u}}, \tag{1}
\]

when θ_{x|u} ≠ 0 and θ_{x̄|u} ≠ 0.^3 We will designate the derivative as constant α_e. 

In Theorem 2.1, α_e = Pr(e, x, u)/θ_{x|u} − Pr(e, x̄, u)/θ_{x̄|u} is a constant in terms of both 
θ_{x|u} and θ_{x̄|u} (and consequently, τ_{x|u}) since Pr(e, x, u) = K_x θ_{x|u} and Pr(e, x̄, u) = K_x̄ θ_{x̄|u}, 
where K_x = Pr(u)Pr(e | x, u) and K_x̄ = Pr(u)Pr(e | x̄, u) are constants in terms of both 
θ_{x|u} and θ_{x̄|u}. By substituting y, e and z, e for e in Theorem 2.1, we get: 

\[
\alpha_{y,e} = \frac{\partial Pr(y, e)}{\partial \tau_{x|u}}
  = \frac{Pr(y, e, x, u)}{\theta_{x|u}} - \frac{Pr(y, e, \bar{x}, u)}{\theta_{\bar{x}|u}}; \tag{2}
\]

\[
\alpha_{z,e} = \frac{\partial Pr(z, e)}{\partial \tau_{x|u}}
  = \frac{Pr(z, e, x, u)}{\theta_{x|u}} - \frac{Pr(z, e, \bar{x}, u)}{\theta_{\bar{x}|u}}. \tag{3}
\]

3. If either of the previous parameters is zero, we can use the differential approach by Darwiche (2000) to 
compute the derivative directly. 

Now, if we want to enforce the Difference constraint, Pr(y | e) − Pr(z | e) ≥ ε, it 
suffices to ensure that Pr(y, e) − Pr(z, e) ≥ ε Pr(e). Suppose that the previous constraint 
does not hold, and we wish to establish it by applying a change of δ to the meta parameter 
τ_{x|u}. Such a change leads to a change of α_e δ in Pr(e). It also changes Pr(y, e) and Pr(z, e) 
by α_{y,e} δ and α_{z,e} δ, respectively. Hence, to enforce the Difference constraint, we need to 
solve for δ in the following inequality: 

\[
[Pr(y, e) + \alpha_{y,e}\delta] - [Pr(z, e) + \alpha_{z,e}\delta] \ge \epsilon\,[Pr(e) + \alpha_{e}\delta].
\]

Rearranging the terms, we get the following result. 

Corollary 2.1 To satisfy the Difference constraint, we need to change the meta parameter 
τ_{x|u} by δ, such that: 

\[
Pr(y, e) - Pr(z, e) - \epsilon\,Pr(e) \ge \delta\,[-\alpha_{y,e} + \alpha_{z,e} + \epsilon\,\alpha_{e}],
\]

where the α constants are defined by Equations 1, 2 and 3. 

We can similarly solve for parameter changes δ that enforce the Ratio constraint, 
Pr(y | e)/Pr(z | e) ≥ ε, in the following inequality: 

\[
[Pr(y, e) + \alpha_{y,e}\delta] / [Pr(z, e) + \alpha_{z,e}\delta] \ge \epsilon.
\]

Rearranging the terms, we get the following result. 

Corollary 2.2 To satisfy the Ratio constraint, we need to change the meta parameter τ_{x|u} 
by δ, such that: 

\[
Pr(y, e) - \epsilon\,Pr(z, e) \ge \delta\,[-\alpha_{y,e} + \epsilon\,\alpha_{z,e}],
\]

where the α constants are defined by Equations 2 and 3. 

For both the Difference and Ratio constraints, the solution of δ, if any, is always in 
one of two forms: 

• δ ≤ q, for some computed q < 0, in which case the new value of meta parameter τ_{x|u} 
must be in the interval [0, p + q]. 

• δ ≥ q, for some computed q > 0, in which case the new value of meta parameter τ_{x|u} 
must be in the interval [p + q, 1]. 

Note that p is the current value of meta parameter τ_{x|u} (before the change). For many 
parameters, these intervals are empty and, therefore, there is no way we can change these 
meta parameters to enforce the constraint. 
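
Both corollaries reduce to a single linear inequality in δ, so the admissible interval for the new parameter value can be computed in a few lines. The following is a minimal sketch under stated assumptions, not SamIam's implementation: the function names are ours, and the probabilities and α constants are assumed to be supplied by some inference engine. The usage lines rely on a hypothetical two-parameter model (a root node X = Y with prior .02 and assumed likelihoods Pr(e | x) = .49 and Pr(e | x̄) = .01), chosen only so that the posterior equals .50.

```python
def solve_for_delta(lhs, coeff, p):
    """Solve lhs >= delta * coeff for a meta parameter with current value p.

    Returns the interval (lo, hi) of admissible NEW parameter values,
    or None when no value in [0, 1] enforces the constraint.
    """
    if coeff == 0:
        # The parameter has no effect: the constraint holds everywhere or nowhere.
        return (0.0, 1.0) if lhs >= 0 else None
    q = lhs / coeff
    if coeff > 0:  # delta <= q, so the new value lies in [0, p + q]
        hi = min(1.0, p + q)
        return (0.0, hi) if hi >= 0.0 else None
    lo = max(0.0, p + q)  # coeff < 0: delta >= q, new value in [p + q, 1]
    return (lo, 1.0) if lo <= 1.0 else None


def difference_interval(pr_e, pr_ye, pr_ze, a_e, a_ye, a_ze, eps, p):
    """Corollary 2.1: Pr(y,e) - Pr(z,e) - eps*Pr(e) >= delta*[-a_ye + a_ze + eps*a_e]."""
    return solve_for_delta(pr_ye - pr_ze - eps * pr_e, -a_ye + a_ze + eps * a_e, p)


def ratio_interval(pr_ye, pr_ze, a_ye, a_ze, eps, p):
    """Corollary 2.2: Pr(y,e) - eps*Pr(z,e) >= delta*[-a_ye + eps*a_ze]."""
    return solve_for_delta(pr_ye - eps * pr_ze, -a_ye + eps * a_ze, p)


# Hypothetical model: prior p on x, assumed likelihoods a = Pr(e|x), b = Pr(e|not x).
p, a, b = 0.02, 0.49, 0.01
pr_e = p * a + (1 - p) * b   # Pr(e)
pr_ye = p * a                # Pr(y, e) with Y = X
a_e = a - b                  # Equation 1 evaluated for this model
a_ye = a                     # Equation 2 evaluated for this model

# Enforce Pr(y | e) >= .65 as a Difference constraint against an impossible
# event z, so that Pr(z, e) = 0 and its alpha constant is 0.
interval = difference_interval(pr_e, pr_ye, 0.0, a_e, a_ye, 0.0, 0.65, p)
print(interval)
```

With these illustrative inputs the lower end of the interval is roughly .0365, so the prior would have to be raised from .02 to about that value.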

The question now is how to solve these inequalities, efficiently, and for all meta parameters. 
Note that there may be more than one possible parameter change that would enforce 
the given constraint, so we need to identify all such changes. With either Corollary 2.1 
or 2.2, we can easily solve for the amount of change needed, δ, once we know the following 
probabilities: Pr(e), Pr(y, e), Pr(z, e), Pr(e, x, u), Pr(e, x̄, u), Pr(y, e, x, u), Pr(y, e, x̄, u), 
Pr(z, e, x, u), and Pr(z, e, x̄, u). This leads to the following complexity of our technique. 

Corollary 2.3 If we have an algorithm that can compute Pr(i, x, u), for a given instantiation 
i, and all family instantiations x, u of every variable X, in time O(f), then we can 
solve for Corollaries 2.1 and 2.2 for all parameters in time O(f). We do this by running 
the algorithm three times, once with i = e, and then with i = y, e, and finally with i = z, e. 

Recall that the family of a variable X is the set containing X, and its parents U in the 
belief network. 

The join-tree algorithm (Jensen, Lauritzen, & Olesen, 1990) and the differential approach 
(Darwiche, 2000) can both compute Pr(i, x, u), for a given instantiation i and all 
family instantiations x, u of every variable X in O(n exp(w)) time. Here, n is the number of 
variables in the belief network, and w is the width of a given elimination order. SamIam 
uses the differential approach, and thus its running time to identify all possible parameter 
changes in a network is also O(n exp(w)). Note that this is also the time needed to answer 
one of the simplest queries, that of computing the probability of evidence e. 

2.2 Multi-Valued Variables 

Our results can be easily extended to multi-valued variables, as long as we assume a model 
for changing co-varying parameters when one of them changes (Darwiche, 2000; Kjaerulff & 
van der Gaag, 2000). After the parameter θ_{x|u} changes, we need to use a scheme to change 
the other parameters, θ_{x_i|u} for all x_i ≠ x, in order to ensure the sum-to-one constraint. 
The most common way to do this is to use the proportional scheme. In this scheme, 
we change the other parameters so that the ratios between them remain the same. For 
example, suppose we have three parameters θ_{x_1|u} = .6, θ_{x_2|u} = .3 and θ_{x_3|u} = .1. After 
θ_{x_1|u} changes to .8, the other two parameter values will be changed to θ_{x_2|u} = .3(.2/.4) = .15 
and θ_{x_3|u} = .1(.2/.4) = .05 accordingly. We now define the meta parameter τ_{x|u} such that it 
simultaneously changes all parameters according to the proportional scheme. We can then 
obtain a linear relation between Pr(e) and τ_{x|u}, and the partial derivative is given by: 

\[
\frac{\partial Pr(e)}{\partial \tau_{x|u}}
  = \frac{Pr(e, x, u)}{\theta_{x|u}}
  - \frac{\sum_{x_i \neq x} Pr(e, x_i, u)}{\sum_{x_i \neq x} \theta_{x_i|u}}.
\]

This is very similar to the result in Theorem 2.1, in the way that we have grouped all the 
values x_i ≠ x into the value x̄. We can then use Corollaries 2.1 and 2.2 to solve for the 
Difference and Ratio constraints. 
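
The proportional scheme is easy to state in code. The sketch below is illustrative only; the helper name `proportional_update` is ours and is not part of any tool mentioned in the text. It reproduces the three-valued example given in this section.

```python
def proportional_update(params, index, new_value):
    """Set params[index] to new_value and rescale the remaining parameters
    so that their mutual ratios are preserved and the total stays 1."""
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("new_value must be a probability")
    old_value = params[index]
    if old_value >= 1.0:
        raise ValueError("cannot rescale: all other parameters are zero")
    # Every other parameter is multiplied by the same factor.
    scale = (1.0 - new_value) / (1.0 - old_value)
    return [new_value if i == index else p * scale for i, p in enumerate(params)]


# The example from the text: (.6, .3, .1) with the first parameter raised to .8.
updated = proportional_update([0.6, 0.3, 0.1], 0, 0.8)
print(updated)
```

Up to floating-point rounding, the other two values become .15 and .05, and the ratio between them remains 3 to 1.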

We now present another example to illustrate how the results above are used in practice. 
Example 2.1 Consider again the network in Figure 3. Here, we set the evidence such that 
we have smoke, but no report of people evacuating the building, i.e. e = smoke, ¬report. We 
then got the posteriors Pr(fire | e) = .25 and Pr(tampering | e) = .02. We thought in this 
case that the posterior on fire should be no less than .5 and asked SamIam to recommend 
the necessary changes to enforce the constraint, Pr(fire | e) − Pr(¬fire | e) ≥ 0. There were 
five recommendations in this case, three of which could be ruled out based on qualitative 
considerations: 

1. increase the prior on fire to ≥ .03 (from .01); 

2. increase the prior on tampering to ≥ .80 (from .02); 

3. decrease Pr(smoke | ¬fire) to ≤ .003 (from .01); 

4. increase Pr(leaving | ¬alarm) to ≥ .923 (from .001); 

5. increase Pr(report | ¬leaving) to ≥ .776 (from .01). 

Clearly, the only sensible change here is either to increase the prior on fire, or to decrease 
the probability of having smoke without a fire. 

This example and other similar ones suggest that identifying such parameter changes 
and their magnitudes is inevitable for developing a faithful belief network, yet it is not trivial 
for experts to accomplish this task by visual inspection of the belief network, often due to 
its size and complexity. Sensitivity analysis tools such as SamIam can help facilitate this 
by identifying important parameters that need to be fine-tuned in order to satisfy certain 
constraints. Of course, if we are given multiple constraints, we need to be cautious when 
implementing a recommendation made by SamIam due to one constraint, because this may 
result in violating other constraints. In this case, the parameter changes recommended 
by SamIam should be used to help experts in focusing their attention on the relevant 
parameters. 

Moreover, the previous examples illustrate the need to develop more analytic tools to 
understand and explain the sensitivity of queries to certain parameter changes. There is 
also a need to reconcile the sensitivities exhibited by our examples with previous experimen- 
tal studies demonstrating the robustness of probabilistic queries against small parameter 
changes in certain application areas, such as diagnosis (Pradhan et al., 1996). We address 
these particular questions in the next two sections. 

3. The Sensitivity of Probabilistic Queries to Parameter Changes 

Our starting point in understanding the sensitivity of a query Pr(y | e) to changes in a 
meta parameter τ_{x|u} is to analyze the derivative ∂Pr(y | e)/∂τ_{x|u}. In our analysis, we 
assume that X is binary, but Y and all other variables in the network can be multi-valued. 
The following theorem provides a simple bound on this derivative, in terms of Pr(y | e) 
and Pr(x | u) only. We then use this simple bound to study the effect of changes to meta 
parameters on probabilistic queries. 

Figure 4: The plot of the upper bound on the partial derivative ∂Pr(y | e)/∂τ_{x|u}, as given 
in Theorem 3.1, against Pr(x | u) and Pr(y | e). 



Theorem 3.1 If X is a binary variable in a belief network, then:^4 

\[
\left| \frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}} \right|
  \le \frac{Pr(y \mid e)\,(1 - Pr(y \mid e))}{Pr(x \mid u)\,(1 - Pr(x \mid u))}.
\]

The bound in Theorem 3.1 is tight, and we will show later an example for which the 
derivative assumes the above bound exactly. The main point to note about this bound is 
that it is independent of any given belief network.^5 

The plot of this bound against Pr(x | u) and Pr(y | e) is shown in Figure 4. A number 
of observations are in order about this plot: 

• For extreme values of Pr(x | u), the bound approaches infinity, and thus a small 
absolute change in the meta parameter τ_{x|u} can have a big impact on the query 
Pr(y | e). 

• On the other hand, the bound approaches 0 for extreme values of the query Pr(y | e). 
Therefore, a small absolute change in the meta parameter τ_{x|u} will have a small effect 
on the absolute change in the query. 

4. This theorem and all results that follow require that τ_{x|u} ≠ 0 and τ_{x|u} ≠ 1, since we can only use the 
expression in Theorem 2.1 under these conditions. 

5. Note that we have an exact closed form for the derivative ∂Pr(y | e)/∂τ_{x|u} (Darwiche, 2000; Greiner, 
Grove, & Schuurmans, 1997), but that form includes terms which are specific to the given belief network. 

Figure 5: The network used in Example 3.1. 

One of the implications of this result is that if we have a belief network where queries 
of interest Pr(y | e) have extreme values, then such queries will be robust against small 
changes in network parameters. This of course assumes that robustness is understood to 
be a small change in the absolute value of the given query. Interestingly enough, if y is a 
disease which is diagnosed by finding e — that is, the probability Pr{y \ e) is quite high — 
then it is not surprising that such queries would be robust against small perturbations to 
network parameters. This seems to explain some of the results by Pradhan et al. (1996), 
where robustness has been confirmed for queries with Pr(y | e) > .9. 

Another implication of the above result is that one has to be careful when changing 
parameters that are extreme. Such parameters are potentially very influential and one 
must handle them with care. 

Therefore, the worst situation from a robustness viewpoint materializes if one has ex- 
treme parameters with non-extreme queries. In such a case, the queries can be very sensitive 
to small variations in the parameters. 
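
To make the contrast concrete, the bound of Theorem 3.1, Pr(y | e)(1 − Pr(y | e)) divided by Pr(x | u)(1 − Pr(x | u)), can be evaluated for a few combinations of query and parameter values. This is a small sketch; the function name and the sample values are ours and are chosen only for illustration.

```python
def derivative_bound(pr_y_given_e, pr_x_given_u):
    """Upper bound of Theorem 3.1 on |dPr(y|e)/dtau_{x|u}|."""
    if not 0.0 < pr_x_given_u < 1.0:
        raise ValueError("the bound requires 0 < Pr(x|u) < 1")
    return (pr_y_given_e * (1.0 - pr_y_given_e)
            / (pr_x_given_u * (1.0 - pr_x_given_u)))


# Extreme query, non-extreme parameter: the bound is small.
robust = derivative_bound(0.99, 0.5)
# Non-extreme query, extreme parameter: the bound is large.
fragile = derivative_bound(0.5, 0.001)
print(robust, fragile)
```

The first case gives a bound well below 1, while the second gives a bound in the hundreds, matching the worst-case situation described above.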

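Example 3.1 Consider the network structure in Figure 5. We have two binary nodes, X 
and Y with respective parameters θ_x, θ_x̄ and θ_y, θ_ȳ. We assume that E is a deterministic 
binary node where the value of E is e iff X = Y. This dictates the following CPT for E: 
Pr(e | x, y) = 1, Pr(e | x̄, ȳ) = 1, Pr(e | x, ȳ) = 0 and Pr(e | x̄, y) = 0. The conditional 
probability Pr(y | e) can be expressed using the root parameters θ_x and θ_y as: 

\[
Pr(y \mid e) = \frac{\theta_x \theta_y}{\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}}}.
\]

Since ∂θ_x/∂τ_x = 1 and ∂θ_x̄/∂τ_x = −1, the derivative of Pr(y | e) with respect to the meta 
parameter τ_x is given by: 

\[
\frac{\partial Pr(y \mid e)}{\partial \tau_x}
  = \frac{(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})\,\theta_y
          - \theta_x \theta_y\,(\theta_y - \theta_{\bar{y}})}
         {(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2}
  = \frac{\theta_y \theta_{\bar{y}}}{(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2}.
\]

This is equal to the upper bound given in Theorem 3.1: 

\[
\frac{Pr(y \mid e)\,(1 - Pr(y \mid e))}{Pr(x)\,(1 - Pr(x))}
  = \frac{(\theta_x \theta_y)(\theta_{\bar{x}} \theta_{\bar{y}})}
         {\theta_x \theta_{\bar{x}}\,(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2}
  = \frac{\theta_y \theta_{\bar{y}}}{(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2}.
\]

Now, if we set θ_x = θ_ȳ, the derivative becomes: 

\[
\frac{\partial Pr(y \mid e)}{\partial \tau_x} = \frac{1}{4\,\theta_x \theta_{\bar{x}}},
\]

and as θ_x (or θ_x̄) approaches 0, the derivative approaches infinity. Finally, if we set θ_x = 
θ_ȳ = ε, we have Pr(y | e) = .5, but if we keep θ_y and θ_ȳ constant and change τ_x from ε to 
0, we get the new result Pr(y | e) = 0. 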

Example 3.1 then illustrates three points. First, it shows that the bound in Theorem 3.1 
is tight, i.e. we can construct a belief network that assumes the bound. Second, it gives an 
example network for which the derivative ∂Pr(y | e)/∂τ_{x|u} tends to infinity, and therefore we 
cannot bound the derivative by any constant. Finally, it shows that an infinitesimal absolute 
change in a meta parameter (changing τ_x from ε to 0) can induce a non-infinitesimal absolute 
change in some query (Pr(y | e) changes from .5 to 0). The following theorem, however, 
shows that this is not possible if we consider a relative notion of change. 
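
The claims of Example 3.1 can be checked numerically. The following sketch encodes the three-node network directly; the function names are ours, and the particular values of ε and of the test point (.3, .8) are arbitrary choices made for illustration.

```python
def query(theta_x, theta_y):
    """Pr(y | e) in the network of Example 3.1, where e holds iff X = Y."""
    num = theta_x * theta_y
    return num / (num + (1.0 - theta_x) * (1.0 - theta_y))


def derivative(theta_x, theta_y):
    """Closed-form derivative of Pr(y | e) with respect to tau_x."""
    denom = theta_x * theta_y + (1.0 - theta_x) * (1.0 - theta_y)
    return theta_y * (1.0 - theta_y) / denom ** 2


def bound(theta_x, theta_y):
    """Upper bound of Theorem 3.1 for this network."""
    q = query(theta_x, theta_y)
    return q * (1.0 - q) / (theta_x * (1.0 - theta_x))


# Tightness: the derivative coincides with the bound at an arbitrary point.
d, b = derivative(0.3, 0.8), bound(0.3, 0.8)

# Discontinuity-like jump: theta_x = Pr(not y) = eps, then tau_x moves to 0.
eps = 1e-6
before = query(eps, 1.0 - eps)
after = query(0.0, 1.0 - eps)
print(d, b, before, after)
```

The derivative and the bound agree up to rounding, and the query drops from about .5 to 0 although the parameter moved by only ε.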

Theorem 3.2 Assume that τ_{x|u} ≤ .5 without loss of generality.^6 Suppose that Δτ_{x|u} is an 
infinitesimal change applied to the meta parameter τ_{x|u}, leading to a change of ΔPr(y | e) 
to the query Pr(y | e). We then have: 

\[
\left| \frac{\Delta Pr(y \mid e)}{Pr(y \mid e)} \right|
  \le 2 \left| \frac{\Delta \tau_{x|u}}{\tau_{x|u}} \right|.
\]

For a function f(x), the quantity: 

\[
\lim_{(x - x_0) \to 0} \frac{(f(x) - f(x_0)) / f(x_0)}{(x - x_0) / x_0}
\]

is typically known as the sensitivity of f to x at x_0. Therefore, Theorem 3.2 shows that the 
sensitivity of Pr(y | e) to τ_{x|u} is bounded. 

As an example application of Theorem 3.2, consider Example 3.1 again. The change 
of τ_x from ε to 0 amounts to a relative change |−ε/ε| = 1. The corresponding change of 
Pr(y | e) from .5 to 0 amounts to a relative change of |−.5/.5| = 1. Hence, the relative 
change in the query is not as great from this viewpoint.^7 

The relative change in Pr(y | e) may be greater than double the relative change in τ_{x|u} 
for non-infinitesimal changes because the derivative ∂Pr(y | e)/∂τ_{x|u} depends on the value 
of τ_{x|u} (Darwiche, 2000; Jensen, 1999). Going back to Example 3.1, if we set θ_x = .5 and 
θ_y = .01, we obtain the result Pr(y | e) = .01. If we now increase τ_x to .6, a relative change 
of 20%, we get the new result Pr(y | e) = 0.0149, a relative change of 49%, which is more 
than double the relative change in τ_x. 

The question now is: Suppose that we change a meta parameter τ_{x|u} by an arbitrary 
amount (not an infinitesimal amount), what can we say about the corresponding change in 
the query Pr(y | e)? We have the following result. 

Theorem 3.3 Let O(x | u) denote the odds of x given u: O(x | u) = Pr(x | u)/(1 − Pr(x | u)), 
and let O(y | e) denote the odds of y given e: O(y | e) = Pr(y | e)/(1 − Pr(y | e)). 
Let O'(x | u) and O'(y | e) denote these odds after having applied an arbitrary change to 
the meta parameter τ_{x|u} where X is a binary variable in a belief network. If the change is 
positive, then: 

\[
\frac{O(x \mid u)}{O'(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O'(x \mid u)}{O(x \mid u)};
\]

or if it is negative, then: 

\[
\frac{O'(x \mid u)}{O(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O(x \mid u)}{O'(x \mid u)}.
\]

Combining both results, we have: 

\[
|\ln(O'(y \mid e)) - \ln(O(y \mid e))| \le |\ln(O'(x \mid u)) - \ln(O(x \mid u))|.
\]

6. For a binary variable X, if τ_{x|u} > .5, we can instead choose the meta parameter τ_{x̄|u} without loss of 
generality. 

7. If we consider the meta parameter τ_x̄ = 1 − ε instead, the relative change in τ_x̄ will then amount to 
ε/(1 − ε). But Theorem 3.2 will not be applicable in this case (assuming that ε is close to 0) since the 
theorem requires that the chosen meta parameter be no greater than .5. 

Theorem 3.3 means that the relative change in the odds of y given e is bounded by the 
relative change in the odds of x given u, if X is a binary variable.^8 Note that the result 
makes no assumptions whatsoever about the structure of the given belief network. 
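
The difference between the relative-change view and the log-odds view can be seen on the network of Example 3.1 with θ_x raised from .5 to .6 and θ_y = .01. The sketch below is illustrative; the helper names are ours.

```python
import math


def log_odds(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def query(theta_x, theta_y):
    """Pr(y | e) in the network of Example 3.1, where e holds iff X = Y."""
    num = theta_x * theta_y
    return num / (num + (1.0 - theta_x) * (1.0 - theta_y))


theta_y = 0.01
before, after = query(0.5, theta_y), query(0.6, theta_y)

# Relative changes: the query moves by more than twice the parameter's change.
relative_param = (0.6 - 0.5) / 0.5
relative_query = (after - before) / before

# Log-odds changes: the query never moves by more than the parameter.
param_shift = abs(log_odds(0.6) - log_odds(0.5))
query_shift = abs(log_odds(after) - log_odds(before))
print(relative_param, relative_query, param_shift, query_shift)
```

In this particular network the two log-odds shifts coincide, so the bound of Theorem 3.3 is met with equality even though the relative change in the query exceeds twice that of the parameter.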

To illustrate this theorem, we go back to Example 2.1. We intend to increase the 
posterior Pr(fire | e) from .25 to .5, for e = smoke, ¬report. The log-odds change for the 
query is thus Δlo(Pr(y | e)) = |ln(O'(y | e)) − ln(O(y | e))| = 1.1. There were five 
recommendations made by SamIam and we can calculate the log-odds change, Δlo(τ_{x|u}) = 
|ln(O'(x | u)) − ln(O(x | u))| for each parameter change: 

1. increase the prior on fire to ≥ .03 (from .01): Δlo(τ_{x|u}) = 1.1; 

2. increase the prior on tampering to ≥ .80 (from .02): Δlo(τ_{x|u}) = 5.3; 

3. decrease Pr(smoke | ¬fire) to ≤ .003 (from .01): Δlo(τ_{x|u}) = 1.2; 

4. increase Pr(leaving | ¬alarm) to ≥ .923 (from .001): Δlo(τ_{x|u}) = 9.4; 

5. increase Pr(report | ¬leaving) to ≥ .776 (from .01): Δlo(τ_{x|u}) = 5.8. 

Therefore, we can see that all the recommended parameter changes satisfy Theorem 3.3, 
i.e. the log-odds change of the query is bounded by the log-odds change of the parameter. 
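
The log-odds figures in this list follow directly from the old and new parameter values. The sketch below recomputes them; the dictionary keys are informal labels of ours, and the numeric pairs are the (old, new) values quoted in the recommendations.

```python
import math


def log_odds_change(old, new):
    """|ln O(new) - ln O(old)| for probabilities strictly between 0 and 1."""
    return abs(math.log(new / (1.0 - new)) - math.log(old / (1.0 - old)))


# Desired change in the query Pr(fire | e): from .25 to .5.
query_change = log_odds_change(0.25, 0.5)

# (old, new) values of the five recommended parameter changes.
recommendations = {
    "prior on fire": (0.01, 0.03),
    "prior on tampering": (0.02, 0.80),
    "smoke parameter": (0.01, 0.003),
    "leaving parameter": (0.001, 0.923),
    "report parameter": (0.01, 0.776),
}
changes = {name: log_odds_change(old, new)
           for name, (old, new) in recommendations.items()}

for name, change in changes.items():
    # Theorem 3.3 requires each parameter change to be at least the query change.
    print(f"{name}: {change:.1f} (query: {query_change:.1f})")
```

Every recommended change has a log-odds shift no smaller than that of the query, as Theorem 3.3 requires.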

An interesting special case of Theorem 3.3 is when X is a root node and X = Y. From 
basic probability theory, we have: 

\[
O(x \mid e) = O(x)\,\frac{Pr(e \mid x)}{Pr(e \mid \bar{x})}.
\]

As the ratio Pr(e | x)/Pr(e | x̄) is independent of Pr(x), the ratio O(x | e)/O(x) is also 
independent of this prior. Therefore, we can conclude that: 

\[
\frac{O'(x \mid e)}{O(x \mid e)} = \frac{O'(x)}{O(x)}. \tag{4}
\]

This means we can find the exact amount of change needed for a meta parameter Tx in 
order to induce a particular change on the query Pr{x \ e). There is no need to use the 
more expensive technique of Section 2 in this case. 

8. We recently expanded our results to multi-valued variables, where we arbitrarily change parameters 
θ_{x|u} to new values θ'_{x|u}, for all values x. The resulting bound is: |ln(O'(y | e)) − ln(O(y | e))| ≤ 
ln(max_x θ'_{x|u}/θ_{x|u}) − ln(min_x θ'_{x|u}/θ_{x|u}) (Chan & Darwiche, 2002). 






When do Numbers Really Matter? 



Example 3.2 Consider the network in Figure 3. Suppose that e = report, ¬smoke. Currently, 
Pr(tampering) = .02 and Pr(tampering | e) = .50. We wish to increase the conditional 
probability to .65. We can compute the new prior probability Pr'(tampering) using 
Equation 4: 

\[
\frac{.65/.35}{.50/.50} = \frac{Pr'(\mathit{tampering}) / (1 - Pr'(\mathit{tampering}))}{.02/.98},
\]

giving us Pr'(tampering) = .036, which is equal to the result we obtained using SamIam 
in Section 1. Both the changes to Pr(tampering) and Pr(tampering | e) bring a log-odds 
difference of .616. 
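
Equation 4 can be inverted in closed form, which gives a direct way to reproduce the number in Example 3.2. This is a minimal sketch; the function names are ours, and the inputs are the rounded values quoted in the example.

```python
def odds(p):
    """Odds of a probability p, for 0 <= p < 1."""
    return p / (1.0 - p)


def prob(o):
    """Probability corresponding to odds o."""
    return o / (1.0 + o)


def new_root_prior(prior, posterior, target_posterior):
    """New prior of a root node x that yields target_posterior for Pr(x | e),
    using O'(x | e) / O(x | e) = O'(x) / O(x)."""
    return prob(odds(prior) * odds(target_posterior) / odds(posterior))


new_prior = new_root_prior(0.02, 0.50, 0.65)
print(round(new_prior, 4))
```

With the rounded inputs the new prior comes out at roughly .0365, consistent with the value of .036 reported in the example.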

Theorem 3.3 has a number of implications. First, given a particular query Pr(y | e) and 
a meta parameter τ_{x|u}, it can be used to bound the effect that a change in τ_{x|u} will have on 
the query Pr(y | e). Going back to Example 3.2, we may wish to know what is the impact 
on other conditional probabilities if we apply the change making Pr'(tampering) = .036. 
The log-odds changes for all conditional probabilities in the network will be bounded by 
.616. For example, currently Pr(fire | e) = .029. Using Theorem 3.3, we can find the range 
of the new conditional probability value Pr'(fire | e): 

\[
\left| \ln\!\left( \frac{Pr'(\mathit{fire} \mid e)}{1 - Pr'(\mathit{fire} \mid e)} \right)
     - \ln\!\left( \frac{.029}{.971} \right) \right| \le .616,
\]

giving us the range .016 ≤ Pr'(fire | e) ≤ .053. The exact value of Pr'(fire | e), obtained 
by inference, is .021, which is within the computed bounds. 
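
The range for the new query value follows from shifting the current log-odds by at most the bound in either direction. The sketch below performs that computation; the function name is ours, and the inputs .029 and .616 are the rounded figures from the text, so the endpoints may differ from the quoted ones in the last digit.

```python
import math


def query_range(current, max_log_odds_change):
    """Interval of new query values permitted by a bound on the log-odds change."""
    o = current / (1.0 - current)
    lo = o * math.exp(-max_log_odds_change)
    hi = o * math.exp(max_log_odds_change)
    # Convert the bounding odds back to probabilities.
    return lo / (1.0 + lo), hi / (1.0 + hi)


low, high = query_range(0.029, 0.616)
print(round(low, 4), round(high, 4))
```

The exact value of .021 obtained by inference lies inside the computed interval.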

Second, Theorem 3.3 can be used to efficiently approximate solutions to the Difference 
and Ratio problems we discussed in Section 2. That is, given a desirable change in the 
value of query Pr(y \ e), we can use Theorem 3.3 to immediately compute a lower bound 
on the minimum change to meta parameter τ_{x|u} needed to induce the change. This method 
can be applied in constant time and can serve as a preliminary recommendation, as the 
method proposed in Section 2 is much more expensive computationally. 

Third, suppose that SamIam was used to recommend parameter changes that would 
induce a desirable change on a given query. Suppose further that SamIam returned a 
number of such changes, each of which is capable of inducing the necessary change. The 
question is: which one of these changes should we adopt? The main principle applied 
in these situations is to adopt a "minimal" change. But what is minimal in this case? As 
Theorem 3.3 reveals, a notion of minimality which is based on the amount of absolute change 
can be very misleading. Instead, it suggests that one adopts the change that minimizes the 
relative change in the odds, as other queries can be shown to be robust against such a 
change in a precise sense. 

For example, suppose we are given two parameter changes, one from .1 to .15, and another
from .4 to .45. Both these changes give us the same absolute change of .05. However, the first
change has a log-odds change of .462, while the second one has a log-odds change of .205.
Therefore, two parameter changes that give us the same absolute change can have different
amounts of log-odds change.

On the other hand, two parameter changes that give us the same relative change can
also have different amounts of log-odds change. For example, suppose we are given two
parameter changes, one from .1 to .2, and another from .2 to .4. Both these changes double
the original parameter value. However, the first change has a log-odds change of .811, while
the second one has a log-odds change of .981.
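
The four log-odds changes quoted in the last two paragraphs can be reproduced with a short script. This is a minimal sketch; `log_odds_change` is a hypothetical helper name.

```python
import math


def log_odds_change(p_old, p_new):
    """Absolute difference between the log-odds of two probabilities."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return abs(logit(p_new) - logit(p_old))


# Same absolute change (.05), different log-odds changes.
same_absolute = [log_odds_change(0.1, 0.15), log_odds_change(0.4, 0.45)]
# Same relative change (doubling), different log-odds changes.
same_relative = [log_odds_change(0.1, 0.2), log_odds_change(0.2, 0.4)]

for label, values in (("absolute", same_absolute), ("relative", same_relative)):
    print(label, [f"{v:.4f}" for v in values])
```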

Finally, the result can be used to obtain a better intuitive understanding of parameter 
changes that do or do not matter, a topic which we will discuss in the next section. 



4. Changes that (Don't) Matter 

We now return to a central question: When do changes in network parameters matter and 
when do they not matter? As we mentioned earlier, there have been experimental studies 
investigating the robustness of belief networks against parameter changes (Pradhan et al., 
1996). But we have also shown very simple and intuitive examples where networks can 
be very sensitive to small parameter changes. This calls for a better understanding of the 
effect of parameter changes on queries, so one can intuitively sort out situations in which 
such changes do or do not matter. Our goal in this section is to further develop such 
an understanding by looking more closely into some of the implications of Theorem 3.3. 
We start first by highlighting the difference between this theorem and previous results on 
sensitivity analysis. 



4.1 Network-Specific Sensitivity Analysis 

One of the main differences between our results and other sensitivity analysis approaches
is that we do not need to know the belief network, and hence, do not need to perform
inference. To clarify this difference, we compare it with the sensitivity function approach
(van der Gaag & Renooij, 2001), which computes the sensitivity function that relates a
query, f(x), and a parameter, x, in the form:

\[
f(x) = \frac{a \cdot x + b}{c \cdot x + d},
\]

where a, b, c, d are constants that depend on the given network and are computed by
performing inference as suggested by van der Gaag and Renooij (2001).

Going back to Example 2.1, we can express the query Pr(fire | smoke, report) as a
function of the parameter x = Pr(smoke | fire). The function is given by:

\[
f(x) = \frac{0.003165}{0.9684 \cdot x + 0.003165},
\]

and we plot this function in Figure 6. We can see that at the current parameter value .01,
the query value is .25, but if we decrease it to .003, the query value increases to .5, which
is one of the parameter changes suggested by SamIam.
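
The two query values quoted above can be checked by evaluating the sensitivity function directly. The sketch below assumes the function reads f(x) = 0.003165 / (0.9684 x + 0.003165), that is, a = 0, b = d = 0.003165 and c = 0.9684 in the general form; this reading is the one consistent with the values .25 and .5 in the text, and the function name is hypothetical.

```python
def sensitivity_function(x, a=0.0, b=0.003165, c=0.9684, d=0.003165):
    """f(x) = (a*x + b) / (c*x + d); the defaults are the assumed constants."""
    return (a * x + b) / (c * x + d)


current = sensitivity_function(0.01)    # query value at the current parameter
proposed = sensitivity_function(0.003)  # query value after the suggested change
print(f"f(.01) = {current:.3f}, f(.003) = {proposed:.3f}")
```

Note that obtaining the constants themselves requires inference on the network, which is the cost that the bound of Theorem 3.3 avoids.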

However, we can find a bound on the relation between the query and the parameter
using Theorem 3.3, without doing inference on the network (and without knowing the
network). For example, by changing the current parameter value from .01 to .003, the new
query value will be within the bounds of .09 and .53. On the other hand, if we want the
query value to increase to .5, we have to decrease the parameter value from .01 to at most
.003, or increase it to at least .03.
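
The second use of the bound, finding how far the parameter must move before a target query value becomes possible at all, can be sketched as follows. The code only inverts the log-odds inequality of Theorem 3.3; it gives a necessary condition, not a guarantee that the target is reached, and the function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit()."""
    return 1.0 / (1.0 + math.exp(-z))


def required_parameter_values(param, query, target_query):
    """Parameter values at which the log-odds bound first permits the target query.

    The parameter must move to at most the first value or at least the second.
    """
    shift = abs(logit(target_query) - logit(query))
    return sigmoid(logit(param) - shift), sigmoid(logit(param) + shift)


# Current parameter .01, current query .25, desired query .5.
lower, upper = required_parameter_values(param=0.01, query=0.25, target_query=0.5)
print(f"decrease to at most {lower:.4f} or increase to at least {upper:.4f}")
```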

Figure 6: The plot of the query Pr(fire | smoke, report) against the parameter Pr(smoke |
fire). The second graph shows a magnification of the first graph for the region
where Pr(smoke | fire) is between 0 and .02.



4.2 Assuring Query Robustness 

One of the important issues we have yet to settle is: "What does it mean for a parameter
change to not matter?" One can think of at least three definitions. First, the absolute
change in the probability Pr(y | e) is small. Second, the relative change in the probability
Pr(y | e) is small. Third, the relative change in the odds O(y | e) is small. The first notion is
the one most prevalent in the literature, so we shall adopt it in the rest of this section.

Suppose we have a belief network for a diagnostic application and suppose we are concerned
about the robustness of the query Pr(y | e) with respect to changes in network
parameters. In this application, y is a particular disease and e is a particular finding which
predicts the disease, with Pr(y | e) = .9. Let us define robustness in this case to be an
absolute change of no more than .05 to the given query. Now, let X be a binary variable in
the network and let us ask: What kind of changes to the parameters on X are guaranteed
to keep the query within the desirable range? We can easily use Theorem 3.3 to answer this
question. First, if we are changing a parameter by $\delta$, and if we want the value of the query
to remain ≤ .95, we must ensure that:

\[
\left| \ln\frac{p + \delta}{1 - p - \delta} - \ln\frac{p}{1 - p} \right| \leq \left| \ln\frac{.95}{.05} - \ln\frac{.9}{.1} \right| = .7472,
\]

where p is the current value of the parameter. Similarly, if we want to ensure that the query
remains ≥ .85, we want to ensure that:

\[
\left| \ln\frac{p + \delta}{1 - p - \delta} - \ln\frac{p}{1 - p} \right| \leq \left| \ln\frac{.85}{.15} - \ln\frac{.9}{.1} \right| = .4626.
\]

Figure 7 plots the permissible change $\delta$ as a function of p, the current value of the
parameter. The main point to observe here is that the amount of permissible change
depends on the current value of p, with smaller changes allowed for extreme values of p.
It is also interesting to note that it is easier to guarantee the query to stay ≤ .95 than to
guarantee that it stays ≥ .85. In general, it is more likely for a parameter change to reduce




Figure 7: The amount of parameter change $\delta$ that would guarantee the query Pr(y | e) = .9
to stay within the interval [.85, .95], as a function of the current parameter value
p. The outer envelope guarantees the query to remain ≤ .95, while the inner
envelope guarantees the query to remain ≥ .85.

Figure 8: The amount of parameter change $\delta$ that would guarantee the query Pr(y | e) = .6
to stay within the interval [.55, .65], as a function of the current parameter value
p. The outer envelope guarantees the query to remain ≤ .65, while the inner
envelope guarantees the query to remain ≥ .55.



the value of a query which is close to 1 (and to increase the value of a query which is close 
to 0). Finally, if we are increasing the parameter, then a parameter value close to .4 will 
allow the biggest absolute change. But if we are decreasing the parameter, then a value 
close to .6 will allow the biggest absolute change. 
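
The envelopes discussed here can be approximated numerically by converting each limit on the query into a log-odds budget and then asking how far a parameter p can move within that budget. The following is a minimal sketch under the assumptions of the running example (query value .9, limits .85 and .95); all function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit()."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds_budget(query, limit):
    """Largest log-odds change of the query that does not cross the limit."""
    return abs(logit(limit) - logit(query))


def permissible_change(p, budget):
    """(largest decrease, largest increase) of p with log-odds change <= budget."""
    return p - sigmoid(logit(p) - budget), sigmoid(logit(p) + budget) - p


upper_budget = log_odds_budget(0.9, 0.95)  # keeps the query at or below .95
lower_budget = log_odds_budget(0.9, 0.85)  # keeps the query at or above .85

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    decrease, increase = permissible_change(p, min(upper_budget, lower_budget))
    print(f"p={p:.2f}: may decrease by {decrease:.4f} or increase by {increase:.4f}")
```

Taking the smaller of the two budgets keeps the query inside the whole interval, since the direction in which the query moves is not known in advance.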

Now let us repeat the same exercise but assuming that the initial value of the query
is Pr(y | e) = .6, yet insisting on the same measure of robustness. Figure 8 plots the

Figure 9: The plot of the log-odds difference, $\Delta lo = |\ln(O'(x \mid u)) - \ln(O(x \mid u))|$, against
Pr(x | u) and Pr'(x | u).




Figure 10: The plots of the log-odds difference, $\Delta lo = |\ln(O'(x \mid u)) - \ln(O(x \mid u))|$, against
the new parameter value p' = Pr'(x | u). The figures correspond to different
initial values of the parameter, p = Pr(x | u) = .1, .5, .9, respectively.



permissible changes $\delta$ as a function of p, the current value of the parameter. Again, the
amount of permissible change becomes smaller as the probability p approaches 0 or 1. The
other main point to emphasize is that the permissible changes are now much smaller than
in the previous example, since the initial value of the query is not as extreme. Therefore,
this query is much less robust than the previous one.

More generally, Figure 9 plots the log-odds difference, $|\ln(O'(x \mid u)) - \ln(O(x \mid u))|$,
against Pr(x | u) = p and Pr'(x | u) = p + $\delta$, and Figure 10 shows cross-sections of Figure 9
for three different values of p. Again, the plots explain analytically why we can afford more
absolute changes to non-extreme probabilities (Pradhan et al., 1996; Poole, 1998).

From Figure 10, we also notice that although the plot is symmetric for p = .5, it is not
for p = .1 or p = .9, i.e., absolute changes of $\Delta p$ and $-\Delta p$ give us different amounts
of log-odds change. For example, changing the parameter from .1 to .05 gives us a larger
log-odds change than changing the parameter from .1 to .15. We also notice that the plots
for p = .1 and p = .9 are mirror images of each other. Therefore, the log-odds change is the
same for complementary parameter changes on $\theta_{x|u}$ and $\theta_{\bar{x}|u}$.

We close this section by emphasizing that the above figures identify parameter changes 
that guarantee keeping queries within certain ranges. However, if the belief network has 
specific properties, such as a specific topology, then it is possible for the query to be robust 
against parameter changes that are outside the identified bounds. 

5. Conclusion 

In this paper, we presented an efficient technique for fine-tuning the parameters of a belief 
network. The technique suggests minimal changes to network parameters which ensure 
that certain constraints are enforced on probabilistic queries. Based on this technique, we 
have experimented with some belief networks, only to find out that these networks are 
more sensitive to parameter changes than previous experimental studies seem to suggest. 
This observation leads us to an analytic study on the effect of parameter changes, with the 
aim of characterizing situations under which parameter changes do or do not matter. We 
have reported on a number of results in this direction. Our central result shows that belief 
networks are robust in a very specific sense: the relative change in query odds is bounded 
by the relative change in the parameter odds. A closer look at this result, its meaning, 
and its implications provides interesting characterizations of parameter changes that do 
or do not matter, and explains analytically some of the previous experimental results and 
observations on this matter.

Appendix A. Proofs

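Theorem 2.1 The derivative of Pr(e) with respect to the meta parameter $\tau_{x|u}$ is given
by:

\[
\frac{\partial Pr(e)}{\partial \tau_{x|u}} = \frac{Pr(e, x, u)}{\theta_{x|u}} - \frac{Pr(e, \bar{x}, u)}{\theta_{\bar{x}|u}},
\]

when $\theta_{x|u} \neq 0$ and $\theta_{\bar{x}|u} \neq 0$.

Proof From Russell et al. (1995), the semantics of the first derivative of Pr(e) with respect
to parameter $\theta_{x|u}$ is given by:^9

\[
\frac{\partial Pr(e)}{\partial \theta_{x|u}} = \frac{Pr(e, x, u)}{\theta_{x|u}},
\]

if $\theta_{x|u} \neq 0$, and:

\[
\frac{\partial Pr(e)}{\partial \theta_{\bar{x}|u}} = \frac{Pr(e, \bar{x}, u)}{\theta_{\bar{x}|u}},
\]

if $\theta_{\bar{x}|u} \neq 0$. Because $\theta_{x|u} = \tau_{x|u}$ and $\theta_{\bar{x}|u} = 1 - \tau_{x|u}$, we have:

\[
\frac{\partial Pr(e)}{\partial \tau_{x|u}}
= \frac{\partial Pr(e)}{\partial \theta_{x|u}} - \frac{\partial Pr(e)}{\partial \theta_{\bar{x}|u}}
= \frac{Pr(e, x, u)}{\theta_{x|u}} - \frac{Pr(e, \bar{x}, u)}{\theta_{\bar{x}|u}},
\]

if $\theta_{x|u} \neq 0$ and $\theta_{\bar{x}|u} \neq 0$. □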
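
As a quick sanity check of the expression in Theorem 2.1, one can compare it with a finite-difference estimate on a very small network. The sketch below assumes a hypothetical two-node network X -> E with no parents for X (so u is empty) and made-up values Pr(e | x) = .8 and Pr(e | not x) = .1; none of these numbers come from the paper.

```python
def pr_e(theta_x, theta_not_x, pr_e_given_x=0.8, pr_e_given_not_x=0.1):
    """Pr(e) in the toy network X -> E, as a function of both parameters of X."""
    return theta_x * pr_e_given_x + theta_not_x * pr_e_given_not_x


def derivative_theorem_2_1(tau, pr_e_given_x=0.8, pr_e_given_not_x=0.1):
    """Pr(e, x) / theta_x - Pr(e, not x) / theta_not_x, with theta_x = tau."""
    theta_x, theta_not_x = tau, 1.0 - tau
    pr_e_and_x = theta_x * pr_e_given_x
    pr_e_and_not_x = theta_not_x * pr_e_given_not_x
    return pr_e_and_x / theta_x - pr_e_and_not_x / theta_not_x


def derivative_numeric(tau, step=1e-6):
    """Central finite difference of Pr(e) along the meta parameter tau.

    Changing tau by step changes theta_x by +step and theta_not_x by -step.
    """
    forward = pr_e(tau + step, 1.0 - tau - step)
    backward = pr_e(tau - step, 1.0 - tau + step)
    return (forward - backward) / (2.0 * step)


tau = 0.3
analytic = derivative_theorem_2_1(tau)
numeric = derivative_numeric(tau)
print(f"analytic = {analytic:.6f}, numeric = {numeric:.6f}")
```

The two values agree because moving the meta parameter moves the two co-varying parameters in opposite directions, which is where the minus sign in the theorem comes from.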

Theorem 3.1 If X is a binary variable in a belief network, then:

\[
\left| \frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}} \right| \leq \frac{Pr(y \mid e)(1 - Pr(y \mid e))}{Pr(x \mid u)(1 - Pr(x \mid u))}.
\]



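Proof From Darwiche (2000), the derivative $\partial Pr(y \mid e)/\partial \theta_{x|u}$ is equal to:

\[
\frac{\partial Pr(y \mid e)}{\partial \theta_{x|u}} = \frac{Pr(y, x, u \mid e) - Pr(y \mid e)Pr(x, u \mid e)}{Pr(x \mid u)}.
\]

Since:

\[
\frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}} = \frac{\partial Pr(y \mid e)}{\partial \theta_{x|u}} - \frac{\partial Pr(y \mid e)}{\partial \theta_{\bar{x}|u}},
\]

we have:

\[
\begin{aligned}
\frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}}
&= \frac{Pr(y, x, u \mid e) - Pr(y \mid e)Pr(x, u \mid e)}{Pr(x \mid u)}
 - \frac{Pr(y, \bar{x}, u \mid e) - Pr(y \mid e)Pr(\bar{x}, u \mid e)}{Pr(\bar{x} \mid u)} \\
&= \frac{Pr(y, x, u \mid e) - Pr(y \mid e)Pr(x, u \mid e) - Pr(x \mid u)\left(Pr(y, u \mid e) - Pr(y \mid e)Pr(u \mid e)\right)}{Pr(x \mid u)(1 - Pr(x \mid u))}.
\end{aligned}
\]

In order to find an upper bound on the derivative, we would like to bound the term
$Pr(y, x, u \mid e) - Pr(y \mid e)Pr(x, u \mid e)$. Since $Pr(y, x, u, e) \leq Pr(y, u, e)$ and
$Pr(y, x, u, e) \leq Pr(x, u, e)$, we have:

\[
\begin{aligned}
Pr(y, x, u \mid e) - Pr(y \mid e)Pr(x, u \mid e)
&\leq Pr(y, x, u \mid e) - Pr(y \mid e)Pr(y, x, u \mid e) \\
&= Pr(y, x, u \mid e)Pr(\bar{y} \mid e) \\
&\leq Pr(y, u \mid e)Pr(\bar{y} \mid e).
\end{aligned}
\]

9. We allow the notations $\partial Pr(e)/\partial \theta_{x|u}$ and $\partial Pr(e)/\partial \theta_{\bar{x}|u}$ by treating Pr(e) as a function of $\theta_{x|u}$ and $\theta_{\bar{x}|u}$,
even though it is not allowed in belief networks to change only $\theta_{x|u}$ or $\theta_{\bar{x}|u}$.

Therefore, the upper bound on the derivative is given by:

\[
\frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}} \leq \frac{Pr(y, u \mid e)Pr(\bar{y} \mid e) - Pr(x \mid u)\left(Pr(y, u \mid e) - Pr(y \mid e)Pr(u \mid e)\right)}{Pr(x \mid u)(1 - Pr(x \mid u))},
\]

which is equal to the following term:

\[
\begin{aligned}
&\frac{Pr(\bar{y} \mid e)Pr(y, u \mid e)}{Pr(x \mid u)} + \frac{Pr(y \mid e)Pr(\bar{y}, u \mid e)}{1 - Pr(x \mid u)} \\
&\quad = \frac{(1 - Pr(x \mid u))Pr(\bar{y} \mid e)Pr(y, u \mid e) + Pr(x \mid u)Pr(y \mid e)Pr(\bar{y}, u \mid e)}{Pr(x \mid u)(1 - Pr(x \mid u))} \\
&\quad = \frac{Pr(y, u \mid e)Pr(\bar{y} \mid e) - Pr(x \mid u)\left(Pr(y, u \mid e) - Pr(y \mid e)Pr(u \mid e)\right)}{Pr(x \mid u)(1 - Pr(x \mid u))}.
\end{aligned}
\]

Since $Pr(y, u \mid e) \leq Pr(y \mid e)$ and $Pr(\bar{y}, u \mid e) \leq Pr(\bar{y} \mid e)$, the upper bound on the
derivative is given by:

\[
\begin{aligned}
\frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}}
&\leq \frac{Pr(\bar{y} \mid e)Pr(y, u \mid e)}{Pr(x \mid u)} + \frac{Pr(y \mid e)Pr(\bar{y}, u \mid e)}{1 - Pr(x \mid u)} \\
&\leq \frac{Pr(\bar{y} \mid e)Pr(y \mid e)}{Pr(x \mid u)} + \frac{Pr(y \mid e)Pr(\bar{y} \mid e)}{1 - Pr(x \mid u)} \\
&= \frac{Pr(y \mid e)(1 - Pr(y \mid e))}{Pr(x \mid u)(1 - Pr(x \mid u))}.
\end{aligned}
\]

In order to find a lower bound on the derivative, we note that $Pr(\bar{y} \mid e) = 1 - Pr(y \mid e)$,
and thus $\partial Pr(y \mid e)/\partial \tau_{x|u} = -\partial Pr(\bar{y} \mid e)/\partial \tau_{x|u}$. Therefore, we can get our lower bound by
finding the upper bound on the derivative $\partial Pr(\bar{y} \mid e)/\partial \tau_{x|u}$ and multiplying by $-1$:

\[
\begin{aligned}
\frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}}
&\geq -\frac{Pr(\bar{y} \mid e)(1 - Pr(\bar{y} \mid e))}{Pr(x \mid u)(1 - Pr(x \mid u))} \\
&= -\frac{Pr(y \mid e)(1 - Pr(y \mid e))}{Pr(x \mid u)(1 - Pr(x \mid u))}.
\end{aligned}
\]

Combining the upper bound and the lower bound, we have:

\[
\left| \frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}} \right| \leq \frac{Pr(y \mid e)(1 - Pr(y \mid e))}{Pr(x \mid u)(1 - Pr(x \mid u))}. \quad \Box
\]

Theorem 3.2 Assume that $\tau_{x|u} \leq .5$ without loss of generality. Suppose that $\Delta\tau_{x|u}$ is an
infinitesimal change applied to the meta parameter $\tau_{x|u}$, leading to a change of $\Delta Pr(y \mid e)$
to the query Pr(y | e). We then have:

\[
\left| \frac{\Delta Pr(y \mid e)}{Pr(y \mid e)} \right| \leq 2 \left| \frac{\Delta\tau_{x|u}}{\tau_{x|u}} \right|.
\]

Proof Because $\Delta\tau_{x|u}$ is infinitesimal, from Theorem 3.1:

\[
\left| \frac{\Delta Pr(y \mid e)}{\Delta\tau_{x|u}} \right|
= \left| \frac{\partial Pr(y \mid e)}{\partial \tau_{x|u}} \right|
\leq \frac{Pr(y \mid e)(1 - Pr(y \mid e))}{Pr(x \mid u)(1 - Pr(x \mid u))}.
\]

Arranging the terms, we have:

\[
\left| \frac{\Delta Pr(y \mid e)}{Pr(y \mid e)} \right|
\leq \frac{1 - Pr(y \mid e)}{1 - Pr(x \mid u)} \left| \frac{\Delta\tau_{x|u}}{Pr(x \mid u)} \right|
\leq \frac{1}{.5} \left| \frac{\Delta\tau_{x|u}}{\tau_{x|u}} \right|
= 2 \left| \frac{\Delta\tau_{x|u}}{\tau_{x|u}} \right|,
\]

since $Pr(x \mid u) = \tau_{x|u} \leq .5$. □

Theorem 3.3 Let $O(x \mid u)$ denote the odds of x given u: $O(x \mid u) = Pr(x \mid u)/(1 - Pr(x \mid u))$,
and let $O(y \mid e)$ denote the odds of y given e: $O(y \mid e) = Pr(y \mid e)/(1 - Pr(y \mid e))$.
Let $O'(x \mid u)$ and $O'(y \mid e)$ denote these odds after having applied an arbitrary change to
the meta parameter $\tau_{x|u}$ where X is a binary variable in a belief network. If the change is
positive, then:

\[
\frac{O(x \mid u)}{O'(x \mid u)} \leq \frac{O'(y \mid e)}{O(y \mid e)} \leq \frac{O'(x \mid u)}{O(x \mid u)},
\]

or if it is negative, then:

\[
\frac{O'(x \mid u)}{O(x \mid u)} \leq \frac{O'(y \mid e)}{O(y \mid e)} \leq \frac{O(x \mid u)}{O'(x \mid u)}.
\]

Combining both results, we have:

\[
\left| \ln(O'(y \mid e)) - \ln(O(y \mid e)) \right| \leq \left| \ln(O'(x \mid u)) - \ln(O(x \mid u)) \right|.
\]

Proof We obtain this result by integrating the bound in Theorem 3.1. In particular, if
we change $\tau_{x|u}$ to $\tau'_{x|u} > \tau_{x|u}$, and consequently Pr(y | e) changes to Pr'(y | e), we can
separate the variables in the upper bound on the derivative in Theorem 3.1, integrate over
the intervals, and yield:

\[
\int_{Pr(y \mid e)}^{Pr'(y \mid e)} \frac{dPr(y \mid e)}{Pr(y \mid e)(1 - Pr(y \mid e))}
\leq \int_{\tau_{x|u}}^{\tau'_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})}.
\]

This gives us the solution:

\[
\begin{aligned}
&\ln(Pr'(y \mid e)) - \ln(Pr(y \mid e)) - \ln(1 - Pr'(y \mid e)) + \ln(1 - Pr(y \mid e)) \\
&\quad \leq \ln(\tau'_{x|u}) - \ln(\tau_{x|u}) - \ln(1 - \tau'_{x|u}) + \ln(1 - \tau_{x|u}),
\end{aligned}
\]

and after taking exponentials, we have:

\[
\frac{Pr'(y \mid e)/(1 - Pr'(y \mid e))}{Pr(y \mid e)/(1 - Pr(y \mid e))}
\leq \frac{\tau'_{x|u}/(1 - \tau'_{x|u})}{\tau_{x|u}/(1 - \tau_{x|u})},
\]

which is equivalent to:

\[
\frac{O'(y \mid e)}{O(y \mid e)} \leq \frac{O'(x \mid u)}{O(x \mid u)}.
\]

Similarly, we can separate the variables in the lower bound on the derivative in Theorem 3.1,
integrate over the intervals, and yield:

\[
\int_{Pr(y \mid e)}^{Pr'(y \mid e)} \frac{dPr(y \mid e)}{Pr(y \mid e)(1 - Pr(y \mid e))}
\geq -\int_{\tau_{x|u}}^{\tau'_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})}.
\]

This gives us the solution:

\[
\begin{aligned}
&\ln(Pr'(y \mid e)) - \ln(Pr(y \mid e)) - \ln(1 - Pr'(y \mid e)) + \ln(1 - Pr(y \mid e)) \\
&\quad \geq -\ln(\tau'_{x|u}) + \ln(\tau_{x|u}) + \ln(1 - \tau'_{x|u}) - \ln(1 - \tau_{x|u}),
\end{aligned}
\]

and after taking exponentials, we have:

\[
\frac{Pr'(y \mid e)/(1 - Pr'(y \mid e))}{Pr(y \mid e)/(1 - Pr(y \mid e))}
\geq \frac{\tau_{x|u}/(1 - \tau_{x|u})}{\tau'_{x|u}/(1 - \tau'_{x|u})},
\]

which is equivalent to:

\[
\frac{O'(y \mid e)}{O(y \mid e)} \geq \frac{O(x \mid u)}{O'(x \mid u)}.
\]

Therefore, we have the following inequality if $\tau'_{x|u} > \tau_{x|u}$:

\[
\frac{O(x \mid u)}{O'(x \mid u)} \leq \frac{O'(y \mid e)}{O(y \mid e)} \leq \frac{O'(x \mid u)}{O(x \mid u)}.
\]

On the other hand, if we now change $\tau_{x|u}$ to $\tau'_{x|u} < \tau_{x|u}$, we can instead integrate from
$\tau'_{x|u}$ to $\tau_{x|u}$. The integrals will satisfy these two inequalities:

\[
\int_{Pr'(y \mid e)}^{Pr(y \mid e)} \frac{dPr(y \mid e)}{Pr(y \mid e)(1 - Pr(y \mid e))}
\leq \int_{\tau'_{x|u}}^{\tau_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})},
\]

\[
\int_{Pr'(y \mid e)}^{Pr(y \mid e)} \frac{dPr(y \mid e)}{Pr(y \mid e)(1 - Pr(y \mid e))}
\geq -\int_{\tau'_{x|u}}^{\tau_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})}.
\]

We can solve for the inequalities similarly and get the result:

\[
\frac{O'(x \mid u)}{O(x \mid u)} \leq \frac{O'(y \mid e)}{O(y \mid e)} \leq \frac{O(x \mid u)}{O'(x \mid u)}.
\]

Combining the results for both $\tau'_{x|u} > \tau_{x|u}$ and $\tau'_{x|u} < \tau_{x|u}$, we have:

\[
\left| \ln(O'(y \mid e)) - \ln(O(y \mid e)) \right| \leq \left| \ln(O'(x \mid u)) - \ln(O(x \mid u)) \right|. \quad \Box
\]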
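
The final inequality can also be checked empirically on a small example. The sketch below is not part of the proof: it assumes a hypothetical three-node network Y <- X -> E with a binary root X (so u is empty and the meta parameter is tau = Pr(x)), draws random CPT entries and random parameter changes, and tests whether the log-odds change of the query Pr(y | e) ever exceeds that of the parameter.

```python
import math
import random


def logit(p):
    """Log-odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def query(tau, pr_y_given_x, pr_e_given_x):
    """Pr(y | e) by enumeration in the toy network Y <- X -> E, tau = Pr(x)."""
    pr_y_and_e = 0.0
    pr_e = 0.0
    for x_state, pr_x in ((True, tau), (False, 1.0 - tau)):
        pr_y_and_e += pr_x * pr_y_given_x[x_state] * pr_e_given_x[x_state]
        pr_e += pr_x * pr_e_given_x[x_state]
    return pr_y_and_e / pr_e


def bound_holds(trials=2000, seed=0, slack=1e-9):
    """Return True if no random trial violates the log-odds bound."""
    rng = random.Random(seed)

    def draw():
        return rng.uniform(0.01, 0.99)

    for _ in range(trials):
        pr_y_given_x = {True: draw(), False: draw()}
        pr_e_given_x = {True: draw(), False: draw()}
        tau, tau_new = draw(), draw()
        before = query(tau, pr_y_given_x, pr_e_given_x)
        after = query(tau_new, pr_y_given_x, pr_e_given_x)
        query_change = abs(logit(after) - logit(before))
        parameter_change = abs(logit(tau_new) - logit(tau))
        # slack absorbs floating-point rounding only.
        if query_change > parameter_change + slack:
            return False
    return True


print("bound held in every trial:", bound_holds())
```

A passing run does not prove anything beyond this toy network; it only illustrates how the bound relates the two log-odds differences.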